Overview of ggplot2

ggplot2 is one of the most widely used data visualization packages in the R community. It implements the “Grammar of Graphics” approach, in which a plot is built from data, aesthetic mappings, and layered geometries. Plenty of useful free resources are available online.

Let’s import the OpenPowerlifting dataset taken from Kaggle:

library(tidyverse)  # attaches ggplot2, dplyr and tidyr, and provides the %>% pipe used below
powerlift_data <- read.csv("./data/openpowerlifting2.csv", sep = ";")
str(powerlift_data)
## 'data.frame':    117463 obs. of  11 variables:
##  $ Name      : chr  "Adrian Zwaan" "Aiden Westrip" "Andrew Fella" "Andrew Yuile" ...
##  $ Sex       : chr  "M" "M" "M" "M" ...
##  $ Event     : chr  "SBD" "SBD" "SBD" "SBD" ...
##  $ Age       : num  80 28 27 36 34 26 27 32 31 28 ...
##  $ Bodyweight: num  82.1 82 89.2 79.5 114.7 ...
##  $ Squat     : num  100 228 260 125 270 ...
##  $ Bench     : num  72.5 135 140 77.5 180 ...
##  $ Deadlift  : num  145 250 250 142 270 ...
##  $ Total     : num  318 612 650 345 720 ...
##  $ Federation: chr  "GPC-AUS" "GPC-AUS" "GPC-AUS" "GPC-AUS" ...
##  $ Date      : chr  "27/10/2018" "27/10/2018" "27/10/2018" "27/10/2018" ...

Our data set contains results and descriptive data from pretty strong people.

head(powerlift_data)
##               Name Sex Event Age Bodyweight Squat Bench Deadlift Total
## 1     Adrian Zwaan   M   SBD  80       82.1 100.0  72.5    145.0 317.5
## 2    Aiden Westrip   M   SBD  28       82.0 227.5 135.0    250.0 612.5
## 3     Andrew Fella   M   SBD  27       89.2 260.0 140.0    250.0 650.0
## 4     Andrew Yuile   M   SBD  36       79.5 125.0  77.5    142.5 345.0
## 5 Anthony Provenza   M   SBD  34      114.7 270.0 180.0    270.0 720.0
## 6  Arian Behbehani   M   SBD  26       97.6 300.0 167.5    282.5 750.0
##   Federation       Date
## 1    GPC-AUS 27/10/2018
## 2    GPC-AUS 27/10/2018
## 3    GPC-AUS 27/10/2018
## 4    GPC-AUS 27/10/2018
## 5    GPC-AUS 27/10/2018
## 6    GPC-AUS 27/10/2018

Let’s explore the main components of a ggplot object (data, aesthetic mapping, geometries) by looking at the relationship between body weight and total score.

powerlift_data %>% 
    ggplot(aes(x = Bodyweight, y = Total)) +
    geom_point() +
    labs(x = "Bodyweight (Kg)", y = "Total Score (Kg)")
## Warning: Removed 11133 rows containing missing values (geom_point).
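
The three components named above can be inspected on any ggplot object. Here is a minimal sketch, using R's built-in `mtcars` data rather than the powerlifting file (an assumption made only so the example runs on its own):

```r
library(ggplot2)

# Build a plot without printing it; mtcars ships with R
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()

# Data: the data frame the plot was built from
head(p$data, 3)

# Aesthetic mapping: which columns drive x and y
p$mapping

# Geometries: one layer per geom_*() call
length(p$layers)
class(p$layers[[1]]$geom)[1]
```

Each `+ geom_*()` call appends one more entry to `p$layers`, which is why adding layers later in this overview never requires rebuilding the plot from scratch.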

With more than 100,000 points, many of them overlap. Let’s add some transparency to reveal where they concentrate:

powerlift_data %>% 
    ggplot(aes(x = Bodyweight, y = Total)) +
    geom_point(alpha=0.05) +
    labs(x = "Bodyweight (Kg)", y = "Total Score (Kg)")
## Warning: Removed 11133 rows containing missing values (geom_point).
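
The warning is ggplot2 reporting rows where one of the plotted variables is missing. One way to handle this explicitly is to drop those rows before plotting. The sketch below uses a small made-up data frame (`toy` and its values are illustrative, not taken from the dataset):

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data with missing values in both columns
toy <- data.frame(
    Bodyweight = c(82.1, NA, 89.2, 79.5, 114.7),
    Total      = c(317.5, 612.5, NA, 345.0, 720.0)
)

# Keep only rows where both plotted variables are present
toy_complete <- toy %>%
    filter(!is.na(Bodyweight), !is.na(Total))

# Number of rows dropped
nrow(toy) - nrow(toy_complete)

p_complete <- ggplot(toy_complete, aes(x = Bodyweight, y = Total)) +
    geom_point()
p_complete
```

Alternatively, `geom_point(na.rm = TRUE)` silences the warning without changing the data.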

Modifying aesthetics and adding layers is a piece of cake! Let’s map the point color to the federation:

powerlift_data %>% 
    ggplot(aes(x = Bodyweight, y = Total)) +
    geom_point(aes(color = Federation)) +
    labs(x = "Bodyweight (Kg)", y = "Total Score (Kg)")
## Warning: Removed 11133 rows containing missing values (geom_point).
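
Note the difference with the transparency example: `alpha = 0.05` was a fixed value set outside `aes()`, while `color = Federation` is a mapping to a column and goes inside `aes()`. A minimal sketch of the two forms, on the built-in `mtcars` data (chosen here only to keep the example self-contained):

```r
library(ggplot2)

# Mapped: color varies with a column, so it goes inside aes() and gets a legend
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point(aes(color = factor(cyl)))

# Set: one fixed value for every point, so it goes outside aes()
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point(color = "steelblue", alpha = 0.5)

p_mapped
p_fixed
```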

A bar chart, however, is a better way to count the number of entries in each federation.

ggplot(powerlift_data, aes(Federation)) +
    geom_bar() +
    labs(x = "Federation", y = "Number of People")

Shall we arrange the bars by count?

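# Count each lifter once, then count lifters per federation
federation_counts <- powerlift_data %>%
    distinct(Name, .keep_all = TRUE) %>%  # keep the first row found for each name
    group_by(Federation) %>%
    summarise(count = n()) %>%
    arrange(desc(count))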

federations_plot <- federation_counts %>%
    # filter(count > 500) %>%
    ggplot(aes(reorder(Federation, -count), count)) +
    geom_col() +
    labs(
        x = "Federation",
        y = "Number of People"
    )

federations_plot
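
The ordering comes from `reorder()`, which sorts the levels of its first argument by increasing value of the second. A small sketch with made-up counts (`toy_counts` is hypothetical) shows why the count is negated:

```r
# Hypothetical counts for three federations
toy_counts <- data.frame(
    Federation = c("A", "B", "C"),
    count      = c(10, 30, 20)
)

# reorder() sorts factor levels by increasing value of its second argument,
# so negating the count puts the largest federation first
ordered_fed <- reorder(toy_counts$Federation, -toy_counts$count)
levels(ordered_fed)
# "B" "C" "A"
```

ggplot2 draws a discrete axis in level order, so the tallest bar ends up on the left.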

Let’s improve axis text visibility…

federations_plot +
    theme(axis.text.x = element_text(angle = 90))

Even more: let’s keep only the federations with more than 500 lifters.

fedplot3 <- federation_counts %>%
    filter(count > 500) %>%
    ggplot(aes(reorder(Federation, -count), count)) +
    geom_col() +
    labs(
        x = "Federation",
        y = "Number of People"
    ) +
    theme(axis.text.x = element_text(angle = 90))

fedplot3

There is a wide range of customization options available in ggplot2. Axes, labels, titles, and legends can be easily modified. Use themes to create consistent and visually appealing visualizations:

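# theme_minimal() is a complete theme, so the label rotation is re-applied after it
fedplot3 + theme_minimal() + theme(axis.text.x = element_text(angle = 90))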
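
Order matters when combining a complete theme such as `theme_minimal()` with individual `theme()` tweaks: a complete theme replaces all theme settings made before it. A minimal sketch on the built-in `mtcars` data (the object names are illustrative):

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar()

# The complete theme comes last and replaces the rotation set before it
rotation_lost <- base_plot +
    theme(axis.text.x = element_text(angle = 90)) +
    theme_minimal()

# The complete theme comes first, then the tweak is layered on top
rotation_kept <- base_plot +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90))

rotation_lost
rotation_kept
```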

Faceting, which splits a plot into one panel per value of a variable, is also very easy in ggplot2:

powerlift_data %>% 
  pivot_longer(cols = c("Squat", "Bench", "Deadlift"), names_to = "exercise") %>% 
  ggplot(aes(x=Bodyweight, y=value))+
  geom_point()+
  facet_grid(.~exercise)
## Warning: Removed 28422 rows containing missing values (geom_point).

powerlift_data %>% 
  pivot_longer(cols = c("Squat", "Bench", "Deadlift"), names_to = "exercise") %>% 
  filter(value > 0) %>% 
  ggplot(aes(x=Bodyweight, y=value))+
  geom_point(alpha = 0.05)+
  facet_grid(.~exercise) +
  labs(y="Score (Kg)")
## Warning: Removed 8202 rows containing missing values (geom_point).
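
Faceting by exercise works because `pivot_longer()` first stacks the three lift columns into a single pair of columns. A small sketch of that reshaping step, on a made-up two-row table (`wide` and its values are hypothetical):

```r
library(tidyr)

# Hypothetical wide table: one row per lifter, one column per lift
wide <- data.frame(
    Name       = c("Lifter 1", "Lifter 2"),
    Bodyweight = c(82.1, 97.6),
    Squat      = c(100, 300),
    Bench      = c(72.5, 167.5),
    Deadlift   = c(145, 282.5)
)

# Long table: one row per lifter and lift, ready for facet_grid(. ~ exercise)
long <- pivot_longer(
    wide,
    cols = c("Squat", "Bench", "Deadlift"),
    names_to = "exercise"
)

long
```

The lift names land in `exercise` and the lifted weights in the default `value` column, which is the `y` aesthetic used in the faceted plots.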

What if we need some statistical layers?

powerlift_data %>% 
  pivot_longer(cols = c("Squat", "Bench", "Deadlift"), names_to = "exercise") %>% 
  filter(value > 0) %>% 
  ggplot(aes(x=Bodyweight, y=value))+
  geom_point(alpha = 0.05)+
  geom_smooth()+
  facet_grid(.~exercise) +
  labs(y="Score (Kg)")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 8202 rows containing non-finite values (stat_smooth).
## Warning: Removed 8202 rows containing missing values (geom_point).
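
The first message shows that `geom_smooth()` picked a smoothing method on its own. The method can also be stated explicitly. The sketch below swaps in a straight-line fit (`method = "lm"`), which is not what the plot above uses, on the built-in `mtcars` data:

```r
library(ggplot2)

p_smooth <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    # Linear fit with its confidence band; naming method and formula
    # avoids the informational message
    geom_smooth(method = "lm", formula = y ~ x) +
    facet_grid(. ~ am)

p_smooth
```

As in the faceted plot above, the smoother is fitted separately within each panel.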

Another example of aesthetic manipulation: squat vs. bench press performance for a given federation, with point size mapped to age.

powerlift_data %>% 
  filter(Federation =="NASA" & Squat > 0 & Bench > 0) %>% 
  ggplot(aes(x=Bench, y=Squat))+
  geom_point(aes(size=Age))
## Warning: Removed 2657 rows containing missing values (geom_point).

ggplot2 is primarily designed for static visualizations, and incorporating interactive elements with ggplot2 alone is tricky. The magic ggplotly function from the plotly package deserves attention, though: it turns a ggplot into an interactive plot.

# With no argument, ggplotly() converts the most recent ggplot (last_plot())
plotly::ggplotly()
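
Passing the plot explicitly is often clearer than relying on the most recent plot. A minimal sketch, assuming the plotly package is installed and using the built-in `mtcars` data:

```r
library(ggplot2)
library(plotly)

p_static <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()

# Convert the static plot into an interactive htmlwidget
# (hover tooltips, zoom, pan)
p_interactive <- ggplotly(p_static)
p_interactive
```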

When multiple comparisons and iterations are needed, ggplot2 can be used in combination with Shiny to make the exploration interactive.

powerlift_data %>%
  filter(Squat > 0 & Bench > 0) %>% 
  ggplot(aes(x=Bench, y=Squat))+
  geom_point()+
  facet_wrap("Federation")
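
Instead of drawing every federation at once, a Shiny app can redraw one plot for whichever federation is selected. The sketch below is only an illustration of that idea: it assumes the shiny package is installed, and `toy_lifts` is simulated stand-in data, not the powerlifting dataset.

```r
library(shiny)
library(ggplot2)

# Hypothetical stand-in for powerlift_data so the sketch runs on its own
set.seed(1)
toy_lifts <- data.frame(
    Federation = rep(c("Fed A", "Fed B"), each = 50),
    Bench      = runif(100, 50, 200)
)
toy_lifts$Squat <- toy_lifts$Bench * 1.4 + rnorm(100, sd = 15)

ui <- fluidPage(
    selectInput("fed", "Federation", choices = unique(toy_lifts$Federation)),
    plotOutput("scatter")
)

server <- function(input, output, session) {
    # Redraw the scatter plot whenever the selected federation changes
    output$scatter <- renderPlot({
        selected <- toy_lifts[toy_lifts$Federation == input$fed, ]
        ggplot(selected, aes(x = Bench, y = Squat)) +
            geom_point()
    })
}

app <- shinyApp(ui, server)

# Launch only in an interactive session
if (interactive()) runApp(app)
```

Swapping `toy_lifts` for the filtered `powerlift_data` would give a per-federation view of the same squat vs. bench relationship.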